|
The Lemur Project is a collaboration between the Center for Intelligent Information Retrieval at the University of Massachusetts Amherst and the Language Technologies Institute at Carnegie Mellon University. It develops the Lemur Toolkit, an open-source (BSD license) software framework for building language modeling and information retrieval software, and the ''Indri'' search engine. The toolkit is used to develop search engines, text analysis tools, browser toolbars, and data resources in the area of information retrieval (IR). Lemur is written in C and C++ and is distributed with its source files and a makefile. The source code can be modified to develop new libraries. It runs on several operating systems, including UNIX (Linux and Solaris) and Windows XP.

== Features ==
Lemur supports the following features:
* Indexing:
** English, Chinese, and Arabic text
** Word stemming
** Stop words
** Tokenization
** Passage and incremental indexing
* Retrieval:
** Ad hoc retrieval (TF-IDF and InQuery)
** Passage and cross-lingual retrieval
** Language modeling
*** Query model updating
*** Two-stage smoothing
** Relevance feedback
** Structured query language
** Wildcard term matching
* Distributed IR:
** Query-based sampling
** Database-based ranking (CORI)
** Results merging
* Document clustering
* Summarization
* Simple text processing

Source: Wikipedia, the free encyclopedia.
|